Assessing Inter-Modular Error Propagation in Distributed Software

Authors

  • Arshad Jhumka
  • Martin Hiller
  • Neeraj Suri
Abstract

With the functionality of most embedded systems based on software (SW), interactions amongst SW modules arise, resulting in error propagation across them. During SW development, it would be helpful to have a framework that clearly demonstrates the error propagation and containment capabilities of the different SW components. In this paper, we assess the impact of inter-modular error propagation. Adopting a white-box SW approach, we make the following contributions: (a) we study and characterize the error propagation process and derive a set of metrics that quantitatively represents the inter-modular SW interactions, (b) we use a real embedded target system used in an aircraft arrestment system to perform fault-injection experiments to obtain experimental values for the metrics proposed, (c) we show how the set of metrics can be used to obtain the required analytical framework for error propagation analysis. We find that the derived analytical framework establishes a very close correlation between the analytical and experimental values obtained. The intent is to use this framework to be able to systematically develop SW such that inter-modular error propagation is reduced by design.

1 Motivation and Approach

With SW driving dependable system designs, systems are invariably built around sets of cooperating SW modules (tasks) that execute under specified resource and timing constraints. Traditionally, at the hardware (HW) level, classical techniques have involved many variations of replication of computing nodes, and consequently, replicating the software resident on them. With SW modules sharing common resources, intricate interactions between SW modules allow propagation of SW level data errors, which need to be corrected as early as possible to ensure correct delivery of services. Hence the need to define a framework that demonstrates the error containment capabilities of SW modules.

To constrain error propagation in SW, all inter-modular interactions are desired to be effected in a prescribed way, such that the overall distributed SW is composed of Error Containment Modules (ECMs), analogous to Fault Containment Regions (FCRs) in HW, at different abstraction levels. To overcome a transient fault at the node level, different techniques, such as replication or Error Detection and Recovery Mechanisms (EDMs and ERMs), may be used. However, knowing which vulnerable modules to replicate or equip with EDMs and ERMs is of primary importance. Thus, quantitative information is needed to help determine these parameters. Intuitively, SW modules that allow errors to propagate are candidates for replication or to be equipped with EDMs and ERMs, such that the error(s) can be contained and/or corrected.

To assess the impact of error propagation, we adopt a white-box perspective of SW and define a basic set of metrics, namely inter-modular influence and separation, introduced in [14]. However, these metrics do not, by themselves, aid identification of vulnerable modules in the system. Thus, we augment the above set with complementary metrics, namely Error Transmission Probability and Error Transparency, that help characterize error propagation properties of SW modules and aid identification of vulnerable modules. Influence is defined as the probability of a module directly influencing another module, i.e., when no other module is considered, while separation is defined as the probability of a module not influencing another one

Footnote: White-box SW implies that the SW module properties and structure are fully known and modifiable.

Footnote: Supported in part by Saab endowment, TFR, NSF Career CCR 9896321, Volvo Research Foundation (FFP-DCN) & NUTEK (1P21-97-4745).
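As a rough illustration of how a probability-style metric such as influence might be estimated from fault-injection runs, the sketch below counts how often an error injected into one module is later observed in another. This is an assumption-laden sketch, not the paper's definition: the trial format, the function name `estimate_influence`, and the module names are all hypothetical, and the estimator is a plain relative frequency.

```python
from collections import defaultdict


def estimate_influence(trials):
    """Estimate pairwise influence as a relative frequency.

    `trials` is a hypothetical log format: an iterable of
    (source_module, affected_modules) pairs, where `affected_modules`
    is the set of modules in which an error was observed after a
    fault was injected into `source_module`.
    """
    injected = defaultdict(int)    # injections per source module
    propagated = defaultdict(int)  # observed propagations per (source, target)

    for source, affected in trials:
        injected[source] += 1
        for target in affected:
            if target != source:
                propagated[(source, target)] += 1

    # Fraction of injections into `source` that reached `target`.
    return {
        (source, target): count / injected[source]
        for (source, target), count in propagated.items()
    }


if __name__ == "__main__":
    # Made-up trials purely for demonstration.
    demo_trials = [
        ("A", {"B"}),
        ("A", set()),
        ("A", {"B", "C"}),
        ("B", {"C"}),
    ]
    for pair, value in sorted(estimate_influence(demo_trials).items()):
        print(pair, round(value, 3))
```

Pairs that never appear in the result had no observed propagation in the trials supplied; whether that justifies treating the modules as separated depends on how many injections were performed.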

Similar resources

Error Propagation Analysis of Software Architecture Specifications

As software architecture is becoming an important asset in the development of software systems, the study of quantitative assessment of software architectures is gaining importance due to its role in assessing their quality. Error propagation between software system components is a quantitative factor that reflects on the reliability of a software product. We introduce a framework for experimen...

On the Placement of Software Mechanisms for Detection of Data Errors

An important aspect in the development of dependable software is to decide where to locate mechanisms for efficient error detection and recovery. We present a comparison between two methods for selecting locations for error detection mechanisms, in this case executable assertions (EA’s), in black-box modular software. Our results show that by placing EA’s based on error propagation analysis one...

Developing error handling software for object-oriented geographical information

The inclusion of error handling capabilities within geographical information systems (GIS) is seen by many as crucial to the future commercial and legal stability of the technology. This thesis describes the analysis, design, implementation and use of a GIS able to handle both geographical information (GI) and the error associated with that GI. The first stage of this process is the development...

Software Profiling for Designing Dependable Software

This paper describes a method for profiling modular software by analyzing the propagation and effect of data errors. A framework of different metrics, based on the concept of error permeability enables the profiling of vulnerabilities and hot-spots, specifically i) the modules and signals that are most likely exposed to propagating errors, and ii) the modules and signals which, when subjected t...

Fingerprinting Summarizes the History of Internal Processor State Updates into a Cryptographic Signature. the Processors in a Dual Modular Redundant Pair Periodically Exchange and Compare Fingerprints to Corroborate

Recent studies suggest that the soft-error rate in microprocessor logic is likely to become a serious reliability concern by 2010. Detecting soft errors in the processor’s core logic presents a new challenge beyond what error-detecting and correcting codes can handle. Currently, commercial microprocessor systems that require an assurance of reliability employ an error-detection scheme based on d...

Publication date: 2001